Method for 2D and 3D image capturing, representation, processing and compression

ABSTRACT

A process for producing a compressed representation of 2D and 3D images. The image is represented in compressed form by approximating regions of slowly changing brightness or color by a background formed of low-degree polynomials. Fast brightness or color changes are represented by special models, including local models and curvilinear structures. Visual adjacency relations between the models are identified, the background partition represents these adjacency relations, and the curvilinear structures are approximated by spline functions. A three-dimensional image is represented by producing one or several compressed images of the scene from different positions, in which a depth value is associated with each model. A view of the scene from any prescribed point is produced by geometric processing of these compressed data.

FIELD OF THE INVENTION

This invention relates to apparatus and methods for representing 2-dimensional and 3-dimensional scenes and images in a compressed form, particularly, but not exclusively, for the purpose of storing and/or transmitting the compressed data and subsequently reconstructing the picture or the 3D-scene, as seen from any prescribed point, as faithfully as possible.

BACKGROUND OF THE INVENTION

The representation of various objects by data compression is a problem with which the art has been increasingly occupied in recent times. As far as usual images and videosequences are concerned, a background of the present invention is described in the Israeli patent application IL 103389 (the priority of which is claimed in U.S. Pat. No. 5,510,838). The representation of three-dimensional scenes and objects, which has become increasingly important in recent times, requires incomparably bigger amounts of information. Fast representation and rendering of three-dimensional scenes, as seen from an interactively prescribed point, also presents difficult computational problems. The invention presents a method for automatic capturing and highly compressed representation of 2D and 3D scenes and images, with subsequent fast rendering.

SUMMARY OF THE INVENTION

A. Main applications of the invention

1. 3D-itineraries

A 3D-itinerary is a compact representation of a chain of 3D-scenes along a certain camera trajectory. The user can interactively create (in real time on a PC or a comparable platform) photorealistic-quality images of the represented scenes, as seen from any prescribed point in a vicinity of the initial trajectory. In particular, the user can "fly" in real time inside the represented space.

The product consists of two parts:

1. A software package (and/or a dedicated board) for off-line creation of 3D-itineraries. The input for the preparation of a 3D-itinerary is a usual videosequence, showing the desired 3D-scenes from a certain camera path. The package then creates the corresponding 3D-itinerary completely automatically (in minutes of software processing per scene).

2. A simple PC (or comparable) software package for real-time interactive "traveling" along the 3D-itinerary.

Our 3D-itineraries are strongly compressed, and the data volume of a complete 3D-representation of a certain scene (which allows one to see it from any prescribed point) is comparable with the data volume of a single (noncompressed) 2D-image of the same scene.

The main proposed applications include computer-assisted training, advertising, computer games and other multimedia applications.

It is important to stress that the 2D-images, reconstructed from a 3D-itinerary, faithfully represent the scene, so the creation of 3D-itineraries can alternatively be called "3D-compression". In applications where fidelity to the original is less important, while the high quality of the images must, of course, be preserved (as in computer games imagery), much more compact virtual 3D-images and virtual 3D-itineraries can be used (see below).

2. Virtual 3D-images

A virtual 3D-image is a highly compressed representation of a virtual 3D-scene, created from one or several high-quality still images by special interactive tools. The end-user interactively produces, in real time, photorealistic-quality images of this scene from any desired point. These images are not authentic to the real views of a certain 3D-scene, but they provide a complete visual illusion of the required motion inside the scene.

The product consists of two parts:

1. A toolkit for interactive creation of virtual 3D-images. The input consists of one or several still images. The toolkit allows the user to interactively create a 3D-structure on these images, to superimpose them, to supply "invisible parts", etc. In many aspects our tools are similar to (and in fact include) the standard tools used in pre-press and desktop-publishing image processing.

2. A software package for image reproduction, identical to the package used for 3D-itineraries.

Virtual 3D-images are compressed to between 1/10 and 1/50 of the volume of a usual still image. They can be combined into virtual 3D-itineraries, similar to the complete 3D-itineraries described above. However, virtual 3D-itineraries are compressed to a much smaller data volume, they do not require detailed videosequences of the scene for their preparation, and they provide a complete visual illusion of motion inside the represented space. "True" 3D-data, created automatically by our package from a videosequence, can be incorporated into a virtual 3D-image (or a virtual 3D-itinerary) together with interactively produced parts.

The main proposed applications include computer games, advertising, and computer-assisted training.

3. Combined 3D-photorealistic data

If a true 3D-structure of a certain scene is available, as well as its photorealistic image, a combined structure can be created, based on our representation of the image and (in some cases) of the geometric data as well. This combined data is strongly compressed, and it allows for fast photorealistic rendering (ray tracing, illumination effects, etc.) and for fast production of a view from any prescribed position.

One important case is the combination of a digital terrain mapping (DTM) with aerial photography. In this case a photorealistic 3D-terrain model is created, with the DTM data compressed approximately 1:10 and the (black and white) image compressed approximately 1:20 (at the highest quality). This model allows for very fast production of the terrain image from any prescribed point.

The product consists of a software package for off-line creation of the combined data, and a simple PC software package for real-time photorealistic rendering and production of a view from any prescribed position.

The fact that our combined 3D-photorealistic structure is strongly compressed allows one to cover a much wider area with the same data volume. All the image rendering operations are performed on compressed data, and as a result these operations are very fast.

The main proposed applications include moving maps, databases for flight simulators, other kinds of computer-assisted training, advertising, computer games, etc.

4. Image analysis

Our image representation by itself provides a highly detailed low-level image analysis, which forms a basis for subsequent pattern and texture extraction, image segmentation, etc.

The product consists of a software package containing a developed set of interactive tools for visualization and detailed analysis of various image features. In particular, the package contains tools for extraction of high-level image patterns.

5. Image compression

The product consists of software packages (and/or dedicated boards) for fast compression of still images and video sequences, and a simple PC software package for decompression and image manipulation.

Still image compression provides fast compression and decompression of various types of still images, with a compression-to-quality curve better than that of the standard compression methods. It allows for image processing on compressed data, and for adjustment to special types of images and to special quality requirements. In particular, some prescribed types of visual details can be preserved at any compression level.

Videosequence compression provides very low bit rate, high quality compression, and shares the above-mentioned advantages of still image compression.

B. Broad summary of the invention

1. Representation of still images

The Normal Forms (NF) representation of a picture is composed of two parts:

a. Large homogeneous areas (background), which can be easily approximated by (and hence represented as) low-degree polynomials.

b. Regions with more complicated structure (e.g. involving edges, ridges, low-scale details or other complicated features), which are captured by special mathematical models, called Normal Forms. These mathematical models serve as building blocks from which the image is constructed.

The following specific features provide the high efficiency of our representation:

1. The scale of our models is small enough (typically, a few pixels) to provide a high-fidelity image reconstruction.

2. Image details, captured by our models, are defined in simple mathematical terms, thus allowing for effective and computationally inexpensive detection.

3. There is a very small number of types of models.

Each model of a certain type is completely characterized by a few continuous parameters, allowing for a precise adjustment to the image structure. We call these models Normal Forms, since by their nature they are very close to the normal forms of singularities known in mathematical singularity theory.

While the representation of the background shares common features with the conventional methods (improving them due to the fact that only large-scale details are to be captured), the introduction of the Normal Forms constitutes an entirely new step which, essentially, replaces pixels (as the building blocks of a picture) by elements of a (normally) much coarser scale, with no loss of visual information.

On the other hand, the NF representation is a complete decomposition of the image into (low-scale) objects, together with a complete description of the visual adjacency relations between those objects.

This allows us to trace the motion of these objects in neighboring videoframes, to detect their depth, and ultimately to make them the building blocks of a 3D-image representation.

It must be stressed that the NF representation is completely based on a low-level image analysis. Its construction is local and mathematically straightforward and does not involve semantic analysis. As a result, obtaining the NF representation is a stable and computationally inexpensive procedure.

The power of our representation can be illustrated by the following facts:

1. Passing to the NF representation implies no visual degradation of the image, and so is visually lossless. On the other hand, the volume of the data involved in it is a fraction of the data volume of the original image. Thus the NF representation by itself provides an order-of-magnitude image compression. Combined with an additional quantization of the parameters, according to their psychovisual significance, and with a lossless entropy compression, the NF representation provides an image compression technology whose compression-to-quality curve is superior to that of the standard compression techniques.

2. Any image processing operation can be performed on NF's. Indeed, our models (Normal Forms) and their parameters have a very simple visual meaning. As a result, any operation defined in terms of a desired visual effect can be interpreted as a simple operation on the parameters of our models, i.e. as an operation on the NF representation of the image.

3. The objects constituting the NF representation behave in a coherent way in videosequences. This allows for motion detection much more efficient than in conventional schemes. Indeed, we can capture the motion of our objects in different directions, as well as various "generalized motions", such as defocusing of the camera. On this basis a very low bit rate videocompression scheme has been developed.

2. Representation of 3D-scenes: Local 3D-Images

1. Depth detection.

Coherent behavior of our objects on neighboring videoframes (or on close still images of the same scene) allows for effective depth detection in 3D-scenes: "the same" objects can be identified on 2D-images taken from nearby camera positions, and then the depth of each object can be easily computed. Since the Normal Forms faithfully represent the visual content of the image, the depth is faithfully represented at every part of the image, and not only for isolated details.

2. Depth representation.

The depth (detected as described in 1) can be associated with each of our objects (NF's), thus creating a local 3D-image, which represents the 3D-scene as seen from a vicinity of the initial camera position. The local 3D-images of the same scene, taken from different points, can be combined to form a global 3D-structure, representing the scene as seen from any prescribed point.

Notice that the local 3D-extension of an image, represented by NF's, causes only a minor increase in the data volume, so our local 3D-images are still compressed to a small fraction of the data of usual 2D still images.

3. Data blending.

Various forms of the NF organization (that of still images, of videosequences, virtual 3D-images, etc.) are completely compatible and can be combined with one another or with additional structures, like geometric 3D-models, DTM (Digital Terrain Mapping), additional sensor data, etc.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the representation of segments and edge elements by elements of a third-degree curve.

FIGS. 2a-2b show the adjacency of an edge, coupled with a ridge, and the free continuation of this edge.

FIGS. 2c-2e show the adjacency of an edge or ridge joining another edge or ridge at one of its inner points.

FIGS. 2f-2g show the adjacency of several edges or ridges emanating from the same point.

FIG. 3a shows ridges with two adjacent edges on both sides. Such ridges are distinguished from the rest of the ridges and are called N3-models. Temporarily, all the information about the central ridge and the two adjacent edges is stored for these models; later it is replaced by the profile information.

FIG. 3b shows two ridges or N3-models (or their parts) adjacent to the same edge. This relation (and the corresponding subdivision) is explicitly memorized.

FIG. 4 shows a chain of adjacency relations between several models.

FIG. 5 shows joints of the types N3-E-E, N3-E, N3-N3, N3-R, R-N3, N3-R-R.

FIG. 6 shows a "second order joint", representing several elementary joints.

FIG. 7a shows the profile of the N3-model.

FIG. 7b shows the profile of the ridge. It is obtained by substituting into the N3-profile some specific values of the parameters.

FIG. 7c shows the profile of the edge. For an appropriate choice of parameters, it coincides with the margin part of the N3-model profile.

FIG. 8 illustrates an experimental profile determination.

FIG. 9 and FIG. 10 illustrate different types of model completion and simplification.

FIG. 11 illustrates an effect of incorrect edge identification.

FIG. 12 shows how the ridges and the N3-models subdivide the background locally into three parts: the model support and two side-parts. The separating lines are the central lines of the slope edges of the model.

FIG. 13 shows that the parts of the background partition meet one another in a coherent way at all types of joints, as described herein.

FIG. 14 illustrates an approximation of the geometry by parabolic pieces.

FIG. 15a shows a depth parameter, representing the depth of those background parts, bounded by an edge, which are closer to the viewer.

FIG. 15b shows ridge depth parameters: one representing the depth of the central line of the ridge, and an indication of the side (or sides) where the background depth differs from the central line's depth.

FIG. 15c shows an intersection of two models with different depth values.

FIG. 16 shows transparent and non-transparent regions of the background.

FIG. 17a shows a "logarithmic" normal form. It contains an edge, which ends inside the considered cell, and a background whose depth is continuous everywhere except at the edge, and jumps along the edge.

FIG. 17b shows a "loop" normal form. It contains an edge which makes a loop inside the considered cell, crossing itself at a different depth.

FIG. 18a shows a "fold" normal form.

FIG. 18b shows a "cusp" normal form.

FIG. 19 and FIG. 20 illustrate rendering of normal forms from a viewer position different from the original one.

FIG. 21 illustrates depth sorting (z-buffer) and final image rendering.

I. Two-dimensional image representation

The following improvements can be introduced into the process of image compression described in Israeli Patent Application IL 103389 (the priority of which is claimed in U.S. Pat. No. 5,510,838).

1. Approximation scales

The size of 4×4 pixels for the approximation block has been suggested in IL 103389 as a preferred one. However, to improve the capturing of small details, a scale of 3×3 pixel blocks can be used. In particular, this means that the central points of the blocks now coincide with the image pixels (and not with the central points between the pixels, as for 4×4 blocks). Moreover, a weighted approximation on these blocks can be used, with a Gaussian-type weight concentrated at the center of the block.

On the other hand, the cubic approximation, which is used in IL 103389 to construct the "edge elements", is ill-defined on 3×3 pixel blocks. Consequently, it is performed on 5×5 pixel blocks, also with a Gaussian weighting. Thus, at the initial stage of the block approximation, the approximation is computed on two scales: a quadratic approximation on 3×3 pixel blocks and a cubic approximation on 5×5 pixel blocks.
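To make this two-scale computation concrete, the following minimal Python sketch (our illustration, not part of the original disclosure; the Gaussian widths and the interior-pixel assumption are ours) fits a quadratic polynomial on a weighted 3×3 block and a cubic one on a weighted 5×5 block:

```python
import numpy as np

def weighted_poly_fit(patch, deg, sigma):
    """Least-squares fit of a degree-`deg` polynomial z = p(x, y) to a
    square image patch, with a Gaussian weight concentrated at the
    center of the block. Returns one coefficient per monomial."""
    n = patch.shape[0]
    r = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n] - r                  # block-centered coordinates
    w = np.exp(-(xs**2 + ys**2) / (2 * sigma**2)).ravel()
    monos = [(i, d - i) for d in range(deg + 1) for i in range(d + 1)]
    A = np.stack([(xs.ravel()**i) * (ys.ravel()**j) for i, j in monos], axis=1)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], patch.ravel() * sw, rcond=None)
    return coef

def two_scale_approx(img, row, col):
    """Quadratic fit on the 3x3 block and cubic fit on the 5x5 block
    around an interior pixel (row, col)."""
    q = weighted_poly_fit(img[row-1:row+2, col-1:col+2], deg=2, sigma=1.0)
    c = weighted_poly_fit(img[row-2:row+3, col-2:col+3], deg=3, sigma=1.5)
    return q, c
```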

2. Third-order approximation in constructing segments, edge elements and adjacency relations between them

In the implementation of the compression method described in IL 103389, the segments are constructed based exclusively on the quadratic approximation, while in the construction of the edge elements the cubic approximation is used only partly, to determine the state of the ends of the edge elements.

The experiments show that the cubic polynomials, approximating the picture on 5×5 pixel blocks, capture in a stable way some important image features which cannot be captured by quadratic polynomials:

(i) Curvature of the ridges and the edges.

(ii) The "corners" of the ridges and the edges.

(iii) Smaller scale geometry of the ridges and the edges.

This accurate geometric information is very important for a better approximation of the image, on the one hand, and for distinguishing between various kinds of image texture, on the other. In particular, it allows one to eliminate the edges constructed in highly textured areas, which are not necessary for a faithful image representation.

Another important advantage of the third-order information is that it allows one to determine the position and the direction of the constructed elements with much higher accuracy (the improvement may be from half a pixel to one-tenth of a pixel). This accuracy improvement is very important in two respects:

(i) The ridge and edge components, constructed from the segments and edge elements at later stages, obtain a simpler (and, mathematically, smoother) geometric shape. Consequently, they can be approximated and encoded with less stored information.

(ii) The adjacency relations between segments and edge elements are established in a more coherent way, which improves further model construction.

Finally, if the third-order approximation of the image is used, the segments and the edge elements can be represented not by first-order elements, as in IL 103389, but by third-order ones. This means that both segments and edge elements are now represented by an element of a third-degree curve at their central points (see FIG. 1).

A much smaller number of such elements can be used to faithfully represent a ridge (edge) component. This strongly simplifies the computation and significantly increases the computational stability of the algorithm. Mathematically, the required third-order representation of the segments and the edge elements is obtained as follows:

Let z = f(x,y) be the third-order polynomial representing the brightness function at the point where the usual segment has been detected, as described in IL 103389. The third-order segment is the Taylor polynomial of degree 3 of the solution of the system of differential equations dx/dt = ∂f/∂x, dy/dt = ∂f/∂y, passing through the central point of the segment. The third-order edge element is given by the Taylor polynomial of degree 3 of the equilevel curve f(x,y) = const, passing through the central point of the usual edge element.
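As an illustration of this computation, the sketch below (assuming the sympy library and a hypothetical local brightness polynomial f) obtains the degree-3 Taylor polynomial of the gradient-flow solution by repeated differentiation along the flow; the third-order edge element would analogously expand the level curve f(x,y) = const:

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

def taylor_gradient_flow(f, x0, y0, order=3):
    """Degree-`order` Taylor polynomial in t of the solution of
    dx/dt = df/dx, dy/dt = df/dy through (x0, y0)."""
    fx, fy = sp.diff(f, x), sp.diff(f, y)
    # Differentiation of any g(x(t), y(t)) along the flow:
    flow_d = lambda g: sp.diff(g, x) * fx + sp.diff(g, y) * fy
    gx, gy = fx, fy                    # current derivatives of x(t), y(t)
    xt, yt = sp.Float(x0), sp.Float(y0)
    fact = 1
    for k in range(1, order + 1):
        fact *= k
        xt = xt + gx.subs({x: x0, y: y0}) * t**k / fact
        yt = yt + gy.subs({x: x0, y: y0}) * t**k / fact
        gx, gy = flow_d(gx), flow_d(gy)
    return sp.expand(xt), sp.expand(yt)

# Hypothetical cubic brightness patch fitted on a 5x5 block:
f = 0.3*x**3 - 0.5*x*y + 0.8*y**2 + 1.2*x
print(taylor_gradient_flow(f, 0.0, 0.0))
```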

It should be stressed that the third-order polynomial approximations of a typical image are highly unstable if considered in an unstructured way. This instability presents one of the main difficulties in using the third-order information. The computation scheme presented above shows how this difficulty can be settled: the third-order polynomials are used only at the points classified on the basis of the first- and second-order information, and the only questions asked are the ones whose answers are insensitive to the possible noise.

3. Additional types of adjacency relations

In IL 103389 some basic adjacency relations between segments and edge elements have been constructed. However, these relations do not cover all the possible visual adjacencies. In order to provide a faithful image representation, the models constructed must also faithfully represent all the visual adjacencies between them. Therefore, the following additional adjacency relations can be taken into account:

(i) The adjacency of an edge, coupled with a ridge, and the free continuation of this edge (FIGS. 2a, b).

(ii) The adjacency of an edge or ridge joining another edge or ridge at one of its inner points (FIGS. 2c, d, e).

(iii) The adjacency of several edges or ridges emanating from the same point (FIGS. 2f, g). These new adjacencies are constructed in the same way as the old ones, on the basis of geometric proximity between the segments and the edge elements. The third-order information, described above, is very important in this construction.

4. Segments classification

Using the above adjacencies, the segments can be subdivided into "strong" and "weak", according to their position on the central line of the ridge or on its margins. This classification is not always stable, so it is actually done by associating with each segment a number between 0 and 1: its "weakness". This number is later used in the background construction. This completes the list of the main improvements at the level of the initial detection.

At the level of model construction, several additional improvements can be introduced.

5. Adjacency relations between the models

In IL 103389 the main adjacency (or adherence) relations used are those between the ridge and the edges which form the slopes of this ridge (or, dually, between the edge and its margin ridges). These relations are used in IL 103389 to drop redundant information and to simplify the models constructed. However, the adjacency relations used in IL 103389 form only a part of the visually important proximity relations between the constructed models. In order to faithfully represent the image, all the visible adjacency relations between the models must be included in the representation. This is done as follows:

(i) Ridges with two adjacent edges on both sides are distinguished from the rest of the ridges and are called N3-models. Temporarily, all the information about the central ridge and the two adjacent edges is stored for these models. Later it is replaced by the profile information, as described below (FIG. 3a).

(ii) If two ridges or N3-models (or their parts) are adjacent to the same edge, this relation (and the corresponding subdivision) is explicitly memorized (FIG. 3b).

(iii) The local adjacencies of the types (i), (ii), (iii) of Section 3 above are interpreted as the corresponding adjacency relations between the models (FIG. 3c).

As a result, the models constructed are joined to one another at certain points, called "joints". These joints form an important element of the model structure. Consequently, the models (ridges, edges, N3-models) are organized into chains (graphs). The vertices of these graphs are the joints, while the segments are edges, ridges and N3-models.

6. The structure of the joints

As is clear from the description above, joints play an organizing role in the model structure. Respectively, they are constructed in a way that allows one to keep and process all the relevant information. The following types of joints are used: N3-E-E, N3-E, N3-E-N3, E-R, E-R-R, N3, . . . , N3, E, . . . , E (see FIG. 5). N3-models can be replaced by ridges.

The following information is memorized at each joint:

(i) Its coordinates.

(ii) The types and the endpoint coordinates of the models entering this joint.

(iii) The profiles of the entering models.

All these data at a given joint may happen to be highly redundant (as, for example, the profiles of the entering models, the mutual positions of the endpoints, and the directions of the models). This redundancy is used at the quantization and encoding stage to increase the compression. In many cases, several "elementary joints", as described above, form a connection with a highly interrelated geometry (see FIG. 6). In this case, a "second order joint", representing this connection and exploiting its data redundancy, can be constructed.

7. The profiles of the models

The profiles of the ridges and the edges, as constructed in IL 103389, faithfully represent most of the features of typical images. However, the new adjacency relations introduced above require an appropriate profile adjustment at the joints. Respectively, the mathematical expressions representing the profiles of the models are chosen to satisfy the following requirements:

(i) The profile of the N3-model contains, in addition to the parameters described in IL 103389, an "interior width" parameter (FIG. 7a).

(ii) The ridge profile is obtained by substituting into the N3-profile some specific values of the parameters (FIG. 7b).

(iii) The edge profile, for an appropriate choice of parameters, coincides with the margin part of the N3-model profile (FIG. 7c).

This choice of profiles allows for a continuous profile adjustment at the joints.

8. An experimental profile determination

Experiments show that the optimal choice of profiles depends strongly on the image to be represented (in particular, on the equipment used in the image production, on the digitization performed, etc.). This choice turns out to be extremely important for a faithful image representation. The choice described in IL 103389 is appropriate for most usual images. However, in some cases it may be desirable to find and use specific profiles for each image compressed (the additional information to be stored is negligible).

This can be done as follows: Let l be a (ridge or edge) component for which the profile is to be experimentally determined. For each pixel within a distance of several pixels from l, the value of the image brightness at this pixel is taken as the y coordinate (in plane coordinates x, y), while the x value is the distance of the pixel from the curve l (see FIG. 8). The brightness values of all the pixels neighboring l, represented in this form, cover a smooth curve in the (x,y)-plane, which is the curve of the experimental profile for the model l. This curve can later be approximated by a polynomial, splines, etc., and the resulting approximation can be used in the compressed data representation.
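A minimal numpy sketch of this fitting step (the brightness array and the sampled central line l are hypothetical inputs, and the brute-force distance computation is chosen for clarity, not efficiency):

```python
import numpy as np

def experimental_profile(brightness, curve_pts, max_dist=4.0, deg=5):
    """For every pixel within `max_dist` of the curve l, pair its
    distance to l (the x value) with its brightness (the y value),
    then fit a polynomial profile y = p(x)."""
    h, w = brightness.shape
    rows, cols = np.mgrid[0:h, 0:w]
    pix = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    # Distance from every pixel to the nearest sample of l.
    d = np.min(np.linalg.norm(pix[:, None, :] - curve_pts[None, :, :], axis=2), axis=1)
    near = d <= max_dist
    return np.polynomial.Polynomial.fit(d[near], brightness.ravel()[near], deg)
```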

9. Model completion and simplification

The structure of the model graphs and of their profiles, as described above, allows one in many cases to significantly simplify the model structure without compromising the fidelity of the image representation. The following completion operations are used:

(i) N3-model completion. The sequence N3-N-N3 or N3-E-E-N3 is completed to a continuous N3-model (FIG. 9a). The N3 profile described above allows for a faithful representation of the image.

(ii) Edge completion. The sequence E-R-E or E-R-R-E is completed to a continuous edge model (FIG. 9b).

In both cases, gaps in an N3 or edge model can be completed in exactly the same way (FIG. 9c).

(iii) N3 reconstruction. A sequence of edges and adjacent ridges can be transformed into a continuous N3-model, as shown in FIG. 9d.

The following additional simplifications can be performed:

(i) An N3-model or a ridge, adjacent from both sides to another N3-model or ridge, can be dropped (FIG. 10a).

(ii) An edge, adjacent from both sides to ridges (or other ridges), can be dropped (FIG. 10b).

10. Background construction

Several versions of the background construction have been described in IL 103389. Usually they provide good quality image reconstructions. However, in many cases further improvements must be introduced to guarantee a high compression and a high-fidelity image reconstruction over the whole range of possible images. The main difficulty in the background construction can be described as follows: the background represents the part of the image which is not captured by the models. It is characterized by a slow, gradual change of the brightness of the image. The models (edges, ridges and N3-models) separate the image into two parts: one captured and represented by these models themselves, and the other represented by the background. This second part may consist of many separate connected pieces. Indeed, the brightness of the image can jump discontinuously when passing from one background component to another.

A mistake in this background partition usually results in a serious reconstruction error. For example, if a certain edge, separating two background regions of sharply different brightness, has been detected with a gap, the background will be represented by one connected region instead of two, and the brightness values will be smudged over the boundary between the regions (see FIG. 11). The same problem will be caused by any gap between two models which are visually adjacent. Thus, to provide a correct background reconstruction, all the visual adjacencies between the models must be detected and explicitly represented. This detection and representation are described in Sections 5 and 6 above.

11. Background partition

The background partition is constructed according to the following rules:

(i) Each edge subdivides the background (locally) along the central line of this edge into two parts.

(ii) The ridges and the N3-models subdivide the background locally into three parts: the model support and two side-parts. The separating lines are the central lines of the slope edges of the model (FIG. 12).

An important feature of this partition is that its parts meet one another in a coherent way at all types of joints described in Section 6 above (see FIG. 13).

12. Background representation

Once the partition of the background is completed, the representation of the gray level values of the image over the background regions is achieved as follows:

(i) A background cell size is chosen. Usually it is between 4 and 16 pixels.

(ii) For each background cell C_(i), and for each background region B_(j) intersecting this cell, a low-degree polynomial is computed, approximating the background values on C_(i) ∩ B_(j). The degree of the approximating polynomial is usually 0, 1 or 2. For 6×6 and 8×8 background cells, zero-degree polynomials usually provide a faithful image reconstruction.

(iii) For each background cell C_(i), and for each background region B_(j), a number of background representing points are chosen in C_(i) ∩ B_(j), at a distance of 1/4 of the cell size from one another.

(iv) A pyramidal background representation can be chosen to provide a higher compression. For a background cell size l, the background is first represented (according to (i) and (ii)) on cells of size 2l. Then the background data on the scale l are stored as the difference with the corresponding data on the 2l-scale. This construction can be repeated several times, providing a pyramidal representation of the background data.

It is important to stress that the above representation can be arranged in such a way that the intersections C_(i) ∩ B_(j) (which are irrelevant to the represented image: the background regions B_(j) are defined in a completely invariant way, with no reference to the cells C_(i)) are not explicitly computed. Instead, the approximating polynomials and the background reference points are constructed using a certain weighted approximation in a neighborhood of each cell C_(i). The weights in this approximation (as well as in (ii) above) are determined by the type of the basic elements detected at each pixel of the background region: the "empty points" (the set A₁ in the notation of IL 103389) get the highest weight, while the "strong" segments get the lowest weight.
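The weighted approximation over one cell can be sketched as follows (our illustration; the `weights` array, standing in for the per-pixel element types, and the cell bounds are hypothetical inputs):

```python
import numpy as np

def cell_background_fit(brightness, weights, cell, deg=1):
    """Weighted least-squares fit of a degree 0, 1 or 2 polynomial to
    the background over one cell neighborhood. High weights correspond
    to "empty points", low weights to strong segments."""
    r0, r1, c0, c1 = cell
    rr, cc = np.mgrid[r0:r1, c0:c1]
    xv, yv = rr.ravel().astype(float), cc.ravel().astype(float)
    z, w = brightness[r0:r1, c0:c1].ravel(), weights[r0:r1, c0:c1].ravel()
    cols = [np.ones_like(xv)]
    if deg >= 1:
        cols += [xv, yv]
    if deg >= 2:
        cols += [xv * xv, xv * yv, yv * yv]
    A = np.stack(cols, axis=1)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], z * sw, rcond=None)
    return coef          # coefficients of the cell background polynomial
```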

13. Approximation of the geometry

The approximation of the ridges and the edges is based on polygonal lines. Such an approximation usually provides the required compression and reconstruction quality, but in some cases it can cause undesirable visual effects ("staircase") on smoothly curved lines. The following improvements overcome this difficulty:

(i) The curves are approximated by pieces of quadratic parabolas.

(ii) The highly curved regions of the approximated curves are detected in advance (as described in Section 2 above). These regions are approximated first.

(iii) The rest of the curve is approximated by the parabola pieces, starting from the ends of the curved regions, according to the description given in IL 103389, step 7 (see FIG. 14). Notice that the mean-square approximation, as well as various forms of spline approximation, can be used at this stage.

The smooth approximation, as described above, usually contains serious data redundancy. In order to increase compression, this redundancy can be removed in different ways. In particular, the direction and the curvature of each parabolic segment can be used as the prediction for the next one, etc.
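A greedy version of this parabolic approximation might look as follows (a sketch under our own assumptions; the pixel tolerance is an illustrative parameter, and the curvature-prediction encoding mentioned above is omitted):

```python
import numpy as np

def fit_parabola_piece(pts):
    """Least-squares parametric quadratic (x(t), y(t)) over a run of
    curve points, with t the normalized chord length; a parametric
    quadratic traces exactly an arc of a parabola."""
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    t = d / d[-1]
    A = np.stack([np.ones_like(t), t, t * t], axis=1)
    coef, *_ = np.linalg.lstsq(A, pts, rcond=None)   # shape (3, 2)
    return coef, float(np.max(np.linalg.norm(A @ coef - pts, axis=1)))

def parabola_pieces(pts, tol=0.3):
    """Greedy split of a polyline (an (N, 2) array) into parabola
    pieces whose maximal deviation stays below `tol` pixels."""
    pieces, start = [], 0
    while start < len(pts) - 2:
        end = start + 3                  # minimal run: exact fit
        coef, _ = fit_parabola_piece(pts[start:end])
        while end < len(pts):
            cand, err = fit_parabola_piece(pts[start:end + 1])
            if err > tol:
                break
            coef, end = cand, end + 1
        pieces.append(coef)
        start = end - 1                  # neighboring pieces share endpoints
    return pieces
```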

14. Scales hierarchy

Some well-known properties of human visual perception prescribe certain principles of data capturing, representation and quantization for the compression scheme presented. In particular, small color errors are more easily detectable on big areas. In geometry representation, the relative error (with respect to the size of the represented models) is what is visually important.

Respectively, one can formulate the following principles:

(i) The values which are to be represented on a certain scale must be captured by averaging on the same scale. This concerns the profile values of the ridges, edges and N3-models, as well as of "hills". As far as the background is concerned, the average brightness value of each color must be preserved with a relatively high accuracy.

(ii) The brightness values of curvilinear models can be quantized much more coarsely than the background values (and the brightness values of "hills" more coarsely than those of curvilinear models).

(iii) A relative representation of lengths must be used in geometry quantization (i.e. shorter lengths must be stored with a higher absolute accuracy).

(iv) In an incremental data representation, the smaller-scale data must always be related to the bigger-scale data. For example, the margin values of the curvilinear models must be quantized around the corresponding background values, and not vice versa.
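Principle (iii), for instance, amounts to a quantizer with a fixed relative step; a minimal sketch (the step and the minimal length are assumed values):

```python
import numpy as np

def quantize_length(length, rel_step=0.05, min_len=0.25):
    """Logarithmic quantization: a constant *relative* step gives
    shorter lengths a finer absolute resolution."""
    return int(np.round(np.log(length / min_len) / np.log(1.0 + rel_step)))

def dequantize_length(idx, rel_step=0.05, min_len=0.25):
    return min_len * (1.0 + rel_step) ** idx

# A 2-pixel length is resolved to about 0.1 px, a 100-pixel length
# to about 5 px: the absolute accuracy follows the stored size.
```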

15. Capturing the finest scale

Although the suggested representation provides a visually faithful image reconstruction, in some applications it can be important to cover the whole compression-quality curve, starting with lossless compression. In order to provide such a covering, the following approach can be used:

(i) The image is compressed by the proposed method up to a desired compression.

(ii) The difference between the original image and the reconstructed one is formed.

(iii) This difference is compressed by one of the methods providing the full covering of the compression-quality curve (for example, methods based on DCT or wavelets).

Such a combination provides a full covering of the compression-quality curve. As the experiments show, the resulting compression for each required quality is higher than that achieved by the additional method by itself.
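The three steps can be sketched as a simple pipeline; the codec callables below are hypothetical stand-ins for the NF method and for the auxiliary DCT/wavelet method, not a specific API:

```python
import numpy as np

def hybrid_compress(img, nf_compress, nf_decompress, residual_codec):
    """Steps (i)-(iii): model-based compression, residual formation,
    residual compression by a full-range method."""
    nf_data = nf_compress(img)                                 # step (i)
    residual = img.astype(np.int16) - nf_decompress(nf_data)   # step (ii)
    return nf_data, residual_codec.encode(residual)            # step (iii)

def hybrid_decompress(nf_data, res_data, nf_decompress, residual_codec):
    recon = nf_decompress(nf_data).astype(np.int16) + residual_codec.decode(res_data)
    return np.clip(recon, 0, 255).astype(np.uint8)
```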

If a local basis method is used (such as some wavelet-based ones), the corresponding data can be included into the normal form representation. In particular, all the specific features of the presented method (such as operations on compressed data, motion detection, etc.; see IL 103389) are preserved in the combined method.

II. Three-dimensional data representation

1. Virtual 3D-image structure

A virtual 3D-image is a (highly compressed) photorealistic-quality representation of a certain 3D-scene, as seen from a vicinity of the initial viewpoint. A virtual 3D-image serves as the input for the 3D-viewer, which allows the user to interactively produce an image of the scene as seen from any prescribed point.

The representation of virtual 3D-images comprises the following elements:

(i) Basic models, which are exactly the same as in the 2D-image compression described above.

(ii) Each of these models has additional depth parameters.

These depth parameters are associated with the profiles of the models and are represented in exactly the same way as the brightness parameters.

More specifically, for each of the models the following depth parameters are used:

Edge.

(a) One depth parameter, representing the depth of those background parts, bounded by this edge, which are closer to the viewer (FIG. 15a).

(b) An indication of the side of the edge on which the background depth is greater than the edge's depth (or an indication that the depth values on both sides are the same).

Ridge or N3-model.

(a) One depth parameter, representing the depth of the central line of the ridge.

(b) An indication of the side (or sides) where the background depth differs from the central line's depth (FIG. 15b).

Hill or hollow.

(a) A central point depth.

(b) An indication of one of two possibilities: either the background depth is equal to the central depth, or it differs from it.

An important feature of the above models, as representing the virtual 3D-image, is that different models can intersect one another, provided that at the intersection points they have different depth values (FIG. 15c).

Background.

The depth of the background is represented in the same way as the background brightness (see I.10, I.11 and I.12 above), with the following main difference: over each background cell the background of the virtual 3D-image may have several layers (or branches). More accurately, locally the representation of virtual 3D-images is organized as follows:

(a) Over each background cell there are several background branches, each with a different depth range.

(b) Among these branches there are regular and singular ones. A regular branch has exactly the same form as described in I.10, I.11 and I.12 above, i.e. it contains the models passing over the considered cell with the depth corresponding to the considered layer. It also contains the background partition, constructed exactly as in I.11. To each region of this partition a depth value is associated, exactly as the brightness value in I.12. However, there is an important distinction: some regions of the partition can be associated with a special depth value, called "transparent". The "transparent" depth value is associated with those regions whose bounding edges or N3-models carry, on the corresponding side, an indication of a background depth bigger than their central line depth.

Finally, the non-transparent regions also get a brightness value, as described in I.12. The transparent regions do not carry any brightness (see FIG. 16).

(c) A singular layer contains branching points of the depth. Such layers, in addition to the structures described in (b), contain special normal forms, representing the depth branching.

The following normal forms are used:

(i) "Logarithmic" normal form. It contains an edge, which ends insidethe considered cell, and a background, whose depth is continuouseverywhere except the edge, and jumps along the edge (FIG. 17a).

(ii) "Loop" normal form. It contains an edge, which makes a loop insidethe considered cell, crossing itself on a different depth (FIG. 17b).

(iii) Fold normal form. This is one of the Whitney stable singularitiesof the projection of a 3D-surface on the screen plane. It is representedby an edge, two background layers on one side of the edge, and atransparent region on another side (FIG. 18a).

(iv) Cusp normal form. This is the second Whitney singularity. It isrepresented by an edge (with a cusp-type singularity) and a backgroundlayer, depth is univalued on one side of the edge, and forms a triplecovering of the other side (FIG. 18b). Both the types (iii) and (iv) ofsingularities and their normal forms are well-known in the mathematicalliterature (see V. I. Arnold, Singularity Theory, Springer Verlag, 1998and Y. Elihai, Flexible High Order Discretization, Ph.D Thesis,Beer-Sheva, 1995).

While the normal forms of types (i) and (ii) arise naturally in images containing complicated edge geometry (leaves of a tree, etc.), the normal forms (iii) and (iv) usually appear as the visible parts of the boundaries of smooth 3D-bodies. These last two types also play a central role in the rendering of true three-dimensional models, and in particular in their transformation into the structure of a virtual 3D-image (see section 7 below). The quantization and the compression of the virtual 3D-image data are performed in exactly the same way as those of still images (see I above). This completes the description of the structure of a virtual 3D-image.
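The layered cell organization of II.1 can be summarized by a small data structure; the field names in the following sketch are our own illustration and are not prescribed by the text:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Region:
    depth: Optional[float]       # None encodes the special "transparent" value
    brightness: Optional[float]  # carried only by non-transparent regions

@dataclass
class Branch:
    depth_range: Tuple[float, float]
    regions: List[Region] = field(default_factory=list)
    # Singular branches additionally carry depth-branching normal forms:
    # "logarithmic", "loop", "fold" or "cusp" (types (i)-(iv) above).
    normal_forms: List[str] = field(default_factory=list)

@dataclass
class BackgroundCell:
    branches: List[Branch] = field(default_factory=list)  # one per depth layer
```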

2. Rendering of virtual 3D-images

(i) The structure of a virtual 3D-image, as described in II.1 above, is adjusted to a certain fixed camera (or viewer) position. Indeed, the background cells, being cells on the screen, correspond to the angular coordinates with respect to the camera, while the depth is the distance of the object from the camera.

If another viewer position is given, the first step in the image rendering is the construction of a completely similar structure of a virtual 3D-image, but associated with the new viewer position (FIG. 20). This is done in several steps:

(a) A transformation ψ is constructed, which transforms the screen coordinates and the depth of each point with respect to the initial viewer position into the screen coordinates and the depth with respect to the new viewer position. (The construction of the transformation ψ is well known in the mathematics and imaging literature; see D. Hilbert, S. Cohn-Vossen, Anschauliche Geometrie, Berlin, 1932. A minimal sketch is given after step (d) below.)

(b) The transformation ψ is applied to each model in the original representation of the virtual 3D-image. More accurately, it is applied to each point where the geometry or the profile of the model is represented, as well as to the parabolic segments and the widths representing the geometry of the model. ψ is applied according to the screen coordinates of these objects and the depth stored in these objects, as described in II.1. The brightness values of the models remain unchanged. As a result, each model is represented in the screen coordinates and the depth corresponding to the new viewer position. The same transformation is also applied to the background representing points (see I.12).

(c) For each cell of the viewer's screen, the models which intersect a neighborhood of this cell are found. These models are subdivided into several groups, each corresponding to one depth layer over the considered cell.

(d) On each layer the background partition is performed as described in I.11. For each background region, its depth and brightness are constructed by averaging the corresponding values at the images under the transformation ψ of the original background representing points.

This completes the transformation of the virtual 3D-structure according to the new viewer position.
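For an idealized pinhole camera, a transformation of this kind reduces to back-projection, rigid motion and re-projection; the following sketch illustrates the idea (the intrinsic matrix K and the motion (R, t) are our assumptions, not parameters given in the text):

```python
import numpy as np

def make_psi(K, R, t):
    """Return a map psi: (u, v, depth) with respect to the old viewer
    -> (u, v, depth) with respect to the new viewer, for a pinhole
    camera with intrinsics K and rigid motion (R, t) between views."""
    K_inv = np.linalg.inv(K)

    def psi(u, v, depth):
        p_old = depth * (K_inv @ np.array([u, v, 1.0]))  # back-project
        p_new = R @ p_old + t                            # move to new frame
        q = K @ p_new                                    # re-project
        return q[0] / q[2], q[1] / q[2], p_new[2]

    return psi
```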

(ii) Depth sorting (z-buffer) and final image rendering.

In order to produce the final image, as seen from the new viewer position, the virtual 3D-image representation described above must be sorted over each background cell, in order to find the visible parts of the image. This sorting is done as follows:

The layer closest to the viewer is considered first. The background partition regions of this layer which are not transparent enter, with their brightness values, into the final image representation. The transparent regions are further subdivided according to the partition of the second (from the viewer) layer. In this subdivision, new joints are constructed at the crossings of models from different layers. The non-transparent regions enter the final representation, while the transparent ones are further processed according to the third layer, etc. (FIG. 21).
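Schematically, the sorting over one cell can be expressed as follows (a simplified sketch: regions are reduced to (area fraction, depth, brightness) triples with None marking transparency, and the construction of new joints at model crossings is omitted):

```python
def resolve_cell(layers):
    """`layers` lists the background branches of one cell, nearest to
    the viewer first; non-transparent regions enter the final image,
    transparent area is handed on to the next layer."""
    visible, uncovered = [], 1.0
    for layer in layers:
        if uncovered <= 1e-9:
            break                          # nearer layers cover the whole cell
        for area, depth, brightness in layer:
            if brightness is None:
                continue                   # transparent: defer to deeper layers
            visible.append((area, depth, brightness))
            uncovered -= area
    return visible

# An opaque near region over half the cell; the rest shows a deeper layer.
near = [(0.5, 2.0, 120.0), (0.5, 2.0, None)]
far = [(1.0, 5.0, 80.0)]
print(resolve_cell([near, far]))
```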

Since at each subdivision step new joints are constructed at each model intersection point, the resulting structure is identical to the compressed still image structure described in I above. It is finally expanded to a bitmap by fast expansion software (or dedicated hardware). Notice that all the rendering operations, except the very last bitmap expansion step, are performed on compressed data, so they are computationally inexpensive. It is also important to stress that an a priori bound on the maximal cell representation complexity is imposed (approximately the same as for still images); the information exceeding this bound is ignored. This completes the description of the rendering of a virtual 3D-image in order to produce its view from a prescribed position.

3. Interactive creation of virtual 3D-images

The input for the interactive creation of a virtual 3D-image is one or several still images, videosequences, etc. The following operations are performed to create a virtual 3D-image starting from a still image of a certain 3D-scene.

(i) The still image is compressed according to part I above.

(ii) The edges, ridges and N3-models form a partition of the image. If necessary, this partition is completed or simplified interactively.

(iii) On each part of the resulting image partition, a continuous depth function is introduced. This is done interactively, using a tool similar to "Photoshop". (The depth is interpreted as a "color", and the image is "painted" with the depth.) Notice that the depth function can jump along the partition models.

(iv) At this stage the depth created is automatically translated into the model depth data of a virtual 3D-image, as described in II.1 above.

(v) The obscured parts of the image are automatically completed, to a prescribed extent, by continuation of the non-obscured models. If necessary, the obscured parts can be interactively completed using all the usual image processing tools. In particular, parts of other images can be inserted into the obscured parts.

These basic steps, (i) to (v), can be combined with various additional interactive constructions. For example, two virtual 3D-images can be superimposed on one another at different depths. Some objects from one image can be inserted into the second one, etc. True depth data on virtual 3D-images, as well as true 3D-models or synthetic 3D-objects, can also be interactively incorporated into a virtual 3D-structure.

4. Automatic creation of virtual 3D-images

Any known method of automatic depth detection which produces a depth map on the still image can be combined with the procedure described in 3(iv) above to produce a virtual 3D-image completely automatically. However, the detection process described in part I above can be transformed into an efficient depth detection method, especially appropriate for the purpose of constructing virtual 3D-images.

Assume that a videosequence, representing a 3D-scene from a known camera trajectory, is given. Then:

(i) On each frame, the initial detection steps are performed, and the basic elements (segments and edge elements) are constructed, as described in IL 103389 (and/or in I above).

(ii) One of the frames is specified as the reference one. For each basic element on this frame, its motion is detected. This is done either by the method described in IL 103389, section "motion estimation" (extended similarly to edge elements), or by the following simple construction: since the camera trajectory is known, the motion direction of each element is also known. Thus, on the frame neighboring the reference one, the elements nearest to the reference frame elements, displaced in the known motion direction, are found. If the main parameters of an element found (the direction, curvature, slope, etc.) are approximately the same as those of the initial element, it is considered the result of the motion of the initial element.

As a result of the combination of these two methods, the motion is detected for each element of the reference frame.

(iii) Knowing this motion, the camera position and the trajectory, the depth is computed for each element according to a well-known formula (D. Hilbert, S. Cohn-Vossen, Anschauliche Geometrie, Berlin, 1932); a minimal numeric sketch is given at the end of this section. Thus the depth is associated with each basic element existing on the reference frame.

(iv) At this stage the hidden parts of the scene are completed. This is done as follows: the basic elements and their depths are computed for each frame of the sequence. Then the transformation ψ, as described in 2(i)(a) above, is applied to these elements, transforming them into the corresponding elements on the reference frame. Notice that, as a result, some elements on the reference frame will appear several times, but also the elements invisible from the initial camera position will be completed. Notice also that some points of the reference frame can now contain several basic elements, pointing in different directions, each one on a different depth level.

(v) Now the construction of the components and the rest of the model construction is performed exactly as in part I above, with the following difference: the basic elements are connected into one component only if their depth changes gradually.

The depth value of the models is computed from the depth values of the basic elements in exactly the same way as the brightness values (see IL 103389). The depth of the background is constructed exactly as the brightness value, by averaging the depths of the local basic elements in the corresponding region. This completes the description of the automatic production of virtual 3D-images.
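The numeric sketch announced in step (iii), specialized to the simplest camera motion, a pure sideways translation between two frames (classical stereo triangulation; the baseline and focal length values are illustrative):

```python
import numpy as np

def depth_from_translation(u0, u1, baseline, focal):
    """Depth of an element seen at screen x-coordinates u0 and u1 in
    two frames of a camera translated sideways by `baseline`, with
    focal length `focal` in pixels: depth = focal * baseline / disparity."""
    disparity = u0 - u1
    if abs(disparity) < 1e-9:
        return np.inf    # no parallax: the element is effectively at infinity
    return focal * baseline / disparity

# An element displaced by 4 px under a 0.1 m camera shift (focal 800 px)
# lies at a depth of about 20 m.
print(depth_from_translation(412.0, 408.0, baseline=0.1, focal=800.0))
```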

5. A net of virtual 3D-images

Each virtual 3D-image represents a 3D-scene as seen from viewer positions in a vicinity of the original position (its angular size may be 25°-30° and more). Therefore, to represent a scene as seen from any prescribed point, a net of interrelated virtual 3D-images can be used. Since each virtual 3D-image is normally compressed to a small fraction of the data volume of a corresponding still image, this net still forms a very compact representation of a 3D-scene. However, to provide a continuous rendering of the scene as the viewer position moves, the representations obtained from each of the virtual 3D-images in the net must be interpolated. This is done by identifying and interpolating the corresponding models on the neighboring virtual 3D-images in the net. This procedure of model identification and interpolation is described in detail in IL 103389, "Compression of videosequences".

6. A true 3D-model representation

On the basis of the virtual 3D-image structure described above, a model-based representation of the true 3D-structure of 3D-photorealistic scenes can be created. This representation is related to the well-known voxel representation in exactly the same way as the still image model representation, described in section I above, is related to the original pixel representation. In other words, the voxels (which are the volume elements of the size of a pixel) are replaced by models whose typical scale is about 12 pixels. Experiments show that one can expect an average of fewer than ten models in one scale-size cube, so the expected compression (in comparison with the voxel representation) can be of the order of 12³/10 ≈ 170.

The main assumptions in the construction of 3D-models are the same as in the construction of 2D-models, described in part I above. They are related to the experimental fact that, to capture any image in a visually lossless form by models in a basic scale of the order of 10 pixels, a small number of models suffices.

The following list represents the basic 3D-models used:

(i) "3D-edge", which is represented by a smooth surface, bounding aclosed part of the 3D-space.

(ii) "3D-ridge", which is represented by a smooth surface, separatingtwo open parts of the 3D-space.

(iii) "3D-wires", represented by a smooth 3D-curve, with a certainsmoothly changing transversal section.

(iv) "3D-patches", represented by 3D-bodies of the total size smallerthan the scale size.

(v) All the above models, with the piecewise smooth geometry.

(vi) All the 2D-models, as described in part I above.

These models can appear in a representation of a texture on 3D-surfaces.

The construction of 3D-models is performed as follows:

Virtual 3D-images of the processed 3D-scene are constructed, as described in part II above, in such a way that any "open" (or "visible") part of the scene is represented in at least one of these images. The elements of these images actually represent parts of the 3D-models to be constructed. For example, the background with the depth data forms the "3D-edges" and "3D-ridges" above; the edges form either the geometric edges of these surfaces or the singularities of their projection onto the screen plane; the ridges form "3D-wires". In this way the data from several virtual 3D-images are combined to represent true 3D-models.

A true 3D-model representation obtained in this way can be rendered by first producing a virtual 3D-image corresponding to the given viewpoint. This operation is described in more detail in the next section.

7. Constructing a virtual 3D-image from a true 3D-model

In many applications, a true 3D-model of a certain 3D-scene can be known. For example, a Digital Terrain Mapping (DTM) of a certain terrain can be given together with its aerial photography. In this situation, the method described above allows one to use the compressed representation of both the DTM and the aerial photography to create a combined 3D-structure. Then a very fast rendering of this structure can be obtained, as follows:

(i) A projection mapping P of the terrain to the screen is constructed. The mathematical parts of this construction, as well as the inversion of P used below, are described in Y. Elihai, Flexible High Order Discretization, Ph.D. Thesis, Beer-Sheva, 1995.

(ii) A local and global inversion of this mapping P is constructed, as described in Y. Elihai, Flexible High Order Discretization, Ph.D. Thesis, Beer-Sheva, 1995.

(iii) Singular curves of P are constructed, as described in Y. Elihai, Flexible High Order Discretization, Ph.D. Thesis, Beer-Sheva, 1995.

(iv) A virtual 3D-image structure is created. In this structure, the local branches (see II.1 above) are formed by the local branches of P⁻¹; the fold and cusp singularities of P (see Y. Elihai, Flexible High Order Discretization, Ph.D. Thesis, Beer-Sheva, 1995) correspond to the fold and cusp normal forms in II.1, (iii) and (iv) above.

Each model of the compressed aerial photography enters the virtual 3D-image structure, being transformed by P. This completes the construction of the virtual 3D-image from the true 3D-model. Further rendering is performed as described in II.2 above. An important property of the above process is that no "ray tracing" is applied. Instead, the direct projection of the texture models is constructed, which is much simpler computationally.

8. Concluding remarks

(a) The combined method data (see I.15 above) can be included into the virtual 3D-image data structure in exactly the same way as the "hill" model above.

(b) Various additional methods for automatic depth detection for the above models can be used. For example, focusing and refocusing of the camera produce easily controllable effects on the model parameters, depending on their depth. In this way the model's depth can be automatically detected in many important applications, like electron microscope imaging, etc. The assumption that the trajectory of the camera is known can also be avoided.

(c) The construction of virtual 3D-images can be extended in exactly the same way to a virtual 3D-structure on videosequences, including free motion of the represented objects.

(d) Many of the constructions used in the 3D-representation above can be used, with minor modifications, in the compression of videosequences.

(e) The methods of motion detection of the models, given above and in IL 103389, can be used with no relation to the further construction of virtual images. For example, by these methods depth detection can be performed, and moving points on videosequences can be traced. Also, identification of points on two images of the same object can be done using the models described above and in IL 103389.

(f) An important feature of the method presented above is that all the constructions involved are local and straightforward. As a result, the overall computational complexity of the method is low.

(g) Various 3D-processing operations, like ray tracing, light processing, etc., can be performed on virtual 3D-image data.

We claim:
1. A process for picture representation by data compression which comprises the steps of: subdividing the picture into regions; registering for each region a set of brightness values; fixing for each region a characteristic scale in terms of a number of pixels; dividing each region into cells, each of said cells comprising a number of pixels defined by two coordinates, said cells having a linear dimension in the order of said characteristic scale; identifying in each cell basic structures chosen from among smooth areas, positive and negative hills, and curvilinear structures chosen from among edges and ridges; constructing for said curvilinear structures geometric models comprising lines approximating the center lines of said structures and parameters defining the profiles of said structures; associating to each of said smooth areas, positive and negative hills and geometric models of curvilinear structures, a mathematical model; condensing said mathematical models to define a global mathematical model for the cell; quantizing and encoding the data defining said global mathematical model; and storing and/or transmitting said data as representing the primary compression for the picture; wherein, in order to improve the fidelity of picture representation, visual adjacency relations between said basic structures are identified and represented as mathematical adjacency relations between said mathematical models.

2. Process according to claim 1, wherein a background partition takes into account said adjacency relations.

3. Process according to claim 1, wherein chains or graphs formed by said models join one another at joints defined by said adjacency relations.

4. Process according to claim 1, comprising geometric approximation of curvilinear structures by spline functions.

5. Process according to claim 1, wherein basic structures are defined by approximating the brightness function by second-degree polynomials on 3×3 pixel cells and by third-degree polynomials on 5×5 pixel cells.

6. A process of picture representation by data compression which comprises the steps of: compressing pictures by the process according to claim 1; forming a difference picture between the original one and the compressed one; compressing said difference picture by a compression method; and adding said compressed difference picture to said originally compressed picture.

7. A process of compressed representation of three-dimensional scenes, which comprises the steps of: producing one or several compressed images of the 3-D scene by a process according to claim 1; associating to each said model of said compressed image or images a depth value; storing and/or transmitting data defining each model, including said depth values, said data representing the primary compression of the 3-D scene; and producing the view of said 3-D scene from any prescribed point by processing of said data.

8. Process according to claim 7, comprising further compressing of said primary compressed data by further quantization and lossless compression.

9. Process according to claim 7, comprising interactive depth creation.

10. Process according to claim 7, comprising automatic depth creation.

11. Process according to claim 7, comprising application of 3-D image processing operations to said compressed data.

12. Process according to claim 7, wherein said images of said 3-D scene form an interrelated net.

13. A process of compressed representation of 3-D scenes, which comprises the steps of: producing one or several representations of the 3-D scene by the process according to claim 7; and combining the data of said representations to form local 3-D models.

14. Process according to claim 13, comprising rendering of the data by transforming said data to one said representation.

15. A process of representation and fast rendering of combined 3-D texture data, comprising the steps of: compressing texture data by the process according to claim 1; and performing rendering operations on said compressed data.

16. A process of fast rendering of combined 3-D texture data, wherein the rendering is performed by transforming said data into a representation according to claim 7.

17. A process of depth detection, comprising the steps of: compressing several images of the 3-D scene by the process according to claim 1; comparing the parameters of the resulting corresponding models; and computing the depth through the difference in said parameters.

18. Process according to claim 17, wherein images are fixed from different positions and the correspondence between the resulting models is produced by motion detection.

19. Process according to claim 17, wherein said images differ from one another by the camera focusing.

20. A process of point identification on different images of the same 3-D scene, which comprises the steps of: compressing said images by the process according to claim 1; and analyzing the resulting models and identifying the corresponding points.

21. Process according to claim 20, wherein given points are traced in video sequences.

22. Process according to claim 6, wherein the compression method is one of a wavelets method and a JPEG method.